Practical - Week 4
(Department of Spatial Sciences)
2025-10-20
A brief introduction to Species Distribution Modelling (SDM)
SDM is a tool that aims to predict where a species could potentially occur from a limited set of observations.
It can also be used to estimate a species’ niche from its distribution.
It’s a huge and popular field, mainly used in quantitative ecology and conservation.
Multiple software packages and tutorials are available, which makes SDMs easier to implement.
Examples: sdm, dismo, usdm, ecospat, biomod2, etc.
It’s also a recent field that changes and advances quickly, and it’s full of open problems and challenges.
What data do SDMs require?
How do SDMs work?
SDMs relate the biodiversity observations to the environmental data using a variety of algorithms.
Once this relationship has been modeled, they can predict:
Future distributions
Areas where the species might already be present
Areas suitable for (re-)introduction
Distribution of alien species
Different SDM algorithms need different types of occurrence data. These are the three main approaches:
Ensemble methods:
If we fit many models, which one do we choose?
We could choose the one that’s best suited to our data, or the one that performs best.
However, a more popular approach is to build ensemble models.
In this approach, predictions from multiple models are combined or averaged to produce a single prediction.
The most frequently used package for ensemble modelling is biomod2.
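As a rough illustration of the averaging step, the sketch below combines predictions from several models into one, either as a plain mean or weighted by an evaluation score. It is written in plain Python rather than with biomod2, and the model names, suitability values, and weights are all made up for the example.

```python
def weighted_ensemble(predictions, weights):
    """Combine per-model predictions into one weighted average per site."""
    total = sum(weights.values())
    n_sites = len(next(iter(predictions.values())))
    ensemble = []
    for i in range(n_sites):
        value = sum(predictions[m][i] * weights[m] for m in predictions) / total
        ensemble.append(value)
    return ensemble


# Hypothetical habitat suitability (0-1) predicted by three models at four sites
predictions = {
    "glm": [0.2, 0.8, 0.5, 0.1],
    "rf": [0.4, 0.9, 0.3, 0.0],
    "maxent": [0.3, 0.7, 0.4, 0.2],
}
# Hypothetical evaluation scores (e.g. TSS) used as weights
scores = {"glm": 0.5, "rf": 0.75, "maxent": 0.25}

# Equal weights give the plain mean across models
mean_prediction = weighted_ensemble(predictions, {m: 1.0 for m in predictions})
# Better-performing models contribute more to the weighted ensemble
weighted_prediction = weighted_ensemble(predictions, scores)
```

Weighting by performance is only one possible design choice. Medians or committee votes on binary predictions are other ways to combine models.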
ODMAP protocol (Zurell et al. 2020)
What are our research objectives?
Which taxa are we working with? What is our study area, and at what scale?
What data is available?
Obtain:
What kind of biodiversity data do we have?
Ensure the temporal and spatial scale of the biodiversity and environmental data match
Clean biodiversity data of unreliable observations: records placed at country or province centroids, outliers, duplicates, and records located at biodiversity institutions (e.g. museums or herbaria)
Generate pseudo-absences (if necessary)
Remove collinear environmental variables
12 variables from the 19 input variables have collinearity problem:
wc2.1_30s_bio_16 wc2.1_30s_bio_17 wc2.1_30s_bio_19 wc2.1_30s_bio_18 wc2.1_30s_bio_6 wc2.1_30s_bio_4 wc2.1_30s_bio_1 wc2.1_30s_bio_12 wc2.1_30s_bio_10 wc2.1_30s_bio_7 wc2.1_30s_bio_5 wc2.1_30s_bio_15
After excluding the collinear variables, the linear correlation coefficients ranges between:
min correlation ( wc2.1_30s_bio_9 ~ wc2.1_30s_bio_8 ): 0.04396524
max correlation ( wc2.1_30s_bio_13 ~ wc2.1_30s_bio_2 ): -0.6214331
---------- VIFs of the remained variables --------
  Variables           VIF
1 wc2.1_30s_bio_2     3.850564
2 wc2.1_30s_bio_3     2.702829
3 wc2.1_30s_bio_8     2.756529
4 wc2.1_30s_bio_9     2.297982
5 wc2.1_30s_bio_11    6.555181
6 wc2.1_30s_bio_13    2.535430
7 wc2.1_30s_bio_14    2.942497
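The output above appears to come from a stepwise procedure based on the variance inflation factor (VIF). To show the idea, here is a minimal sketch in plain Python: the VIF of each variable is the matching diagonal element of the inverse correlation matrix, and the variable with the highest VIF is dropped until all values fall below a threshold. The threshold of 10, the variable names, and the data are assumptions for illustration, and this is not the implementation used by any R package.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting.

    Fails with ZeroDivisionError if the variables are perfectly collinear.
    """
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF per variable: diagonal of the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}


def vif_step(columns, threshold=10.0):
    """Drop the variable with the highest VIF until all are <= threshold."""
    kept = dict(columns)
    scores = vif(kept)
    while len(kept) > 1 and max(scores.values()) > threshold:
        worst = max(scores, key=scores.get)
        del kept[worst]
        scores = vif(kept)
    return scores


# Hypothetical climate values at six sites; the two temperature variables
# are strongly correlated, so one of them should be excluded.
climate = {
    "temp_mean": [10, 12, 14, 16, 18, 20],
    "temp_max": [15.0, 17.5, 18.5, 21.5, 22.5, 25.0],
    "precip": [80, 60, 90, 50, 100, 70],
}
result = vif_step(climate)
```

Here `result` maps each retained variable to its VIF after the collinear one has been excluded.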
Spatial thinning: keep only one presence or absence record per environmental raster cell
Separate data into training (modelling) and testing sets
$initial
[1] 3024
$kept
[1] 1136
$out
[1] 1888
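The thinning and splitting steps can be sketched as follows, in plain Python. The sketch assumes a regular grid in degrees, keeps the first presence and the first absence found in each cell, and then splits the thinned records at random. The coordinates, the 0.5 degree cell size, the 25% test fraction, and the function names are all hypothetical.

```python
import math
import random


def thin_to_cells(records, cell_size):
    """Keep the first record of each class (presence/absence) per grid cell."""
    seen = set()
    kept = []
    for lon, lat, presence in records:
        key = (math.floor(lon / cell_size), math.floor(lat / cell_size), presence)
        if key not in seen:
            seen.add(key)
            kept.append((lon, lat, presence))
    return kept


def train_test_split(records, test_fraction=0.25, seed=42):
    """Shuffle records reproducibly and hold out a fraction for testing."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical records: (longitude, latitude, presence 1 / absence 0)
records = [
    (4.10, 52.10, 1), (4.12, 52.11, 1),  # two presences in the same cell
    (4.60, 52.10, 1), (4.10, 52.10, 0),
    (5.20, 53.40, 0), (5.21, 53.41, 0),  # two absences in the same cell
    (6.70, 51.90, 1), (6.10, 51.20, 0),
    (3.30, 50.80, 1), (3.90, 51.60, 0),
]

thinned = thin_to_cells(records, cell_size=0.5)
summary = {
    "initial": len(records),
    "kept": len(thinned),
    "out": len(records) - len(thinned),
}
train, test = train_test_split(thinned, test_fraction=0.25)
```

A purely random split like this does not guarantee that presences and absences are equally represented in both sets. Stratified or spatially blocked splits are common alternatives.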
Model selection:
Single model? Which?
Ensemble? How to average over models?
Which model settings should we use? Proportion of testing vs. training data, number of cross-validation folds (data partitions), number of repetitions per fold (robustness vs. computational time)
If we want to produce binary predictions, which threshold should we use?
It all depends on our data and our objectives
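One common way to pick a threshold for binary predictions is to choose the value that maximises sensitivity plus specificity. The plain-Python sketch below shows that single criterion only. The observations and predicted suitabilities are invented, and other criteria may suit other objectives better.

```python
def sens_plus_spec(observed, predicted, threshold):
    """Sensitivity + specificity when suitability >= threshold counts as presence."""
    tp = sum(1 for o, p in zip(observed, predicted) if o == 1 and p >= threshold)
    tn = sum(1 for o, p in zip(observed, predicted) if o == 0 and p < threshold)
    n_pres = sum(observed)
    n_abs = len(observed) - n_pres
    return tp / n_pres + tn / n_abs


def best_threshold(observed, predicted):
    """Try every predicted value as a threshold and keep the best one."""
    candidates = sorted(set(predicted))
    return max(candidates, key=lambda t: sens_plus_spec(observed, predicted, t))


# Hypothetical test data: 5 presences followed by 4 absences
observed = [1, 1, 1, 1, 1, 0, 0, 0, 0]
predicted = [0.9, 0.8, 0.7, 0.6, 0.3, 0.5, 0.4, 0.2, 0.1]

threshold = best_threshold(observed, predicted)
# Convert continuous suitability into presence (1) / absence (0)
binary = [1 if p >= threshold else 0 for p in predicted]
```

With these numbers the selected threshold is 0.6, which misclassifies only the presence predicted at 0.3.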
Exploration of response curves
Assessment of model coefficients and variable importance
Do our results make sense?
Model performance metrics:
AUC: area under the receiver operating characteristic curve (closer to 1 is better, but beware: very high values might indicate overfitting)
TSS: true skill statistic (sensitivity + specificity - 1)
Sensitivity: true positive rate (proportion of presences correctly predicted)
Specificity: true negative rate (proportion of absences correctly predicted)
How you balance correctly predicted presences against correctly predicted absences depends on your goal
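These metrics are simple to compute by hand, as the plain-Python sketch below shows. AUC is computed here as the share of presence-absence pairs in which the presence receives the higher prediction (ties count as half), which is equivalent to the area under the ROC curve. The confusion-matrix counts and predictions are made up for the example.

```python
def metrics_from_counts(tp, fn, tn, fp):
    """Sensitivity, specificity, and TSS from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # presences correctly predicted
    specificity = tn / (tn + fp)  # absences correctly predicted
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "tss": sensitivity + specificity - 1,
    }


def auc(observed, predicted):
    """Probability that a random presence outranks a random absence."""
    pres = [p for o, p in zip(observed, predicted) if o == 1]
    absn = [p for o, p in zip(observed, predicted) if o == 0]
    wins = sum(
        1.0 if p > a else 0.5 if p == a else 0.0
        for p in pres
        for a in absn
    )
    return wins / (len(pres) * len(absn))


# Hypothetical confusion matrix: 50 presences and 50 absences in the test set
metrics = metrics_from_counts(tp=40, fn=10, tn=45, fp=5)

# Hypothetical continuous predictions for 5 presences and 4 absences
observed = [1, 1, 1, 1, 1, 0, 0, 0, 0]
predicted = [0.9, 0.8, 0.7, 0.6, 0.3, 0.5, 0.4, 0.2, 0.1]
auc_value = auc(observed, predicted)
```

Unlike sensitivity, specificity, and TSS, AUC does not depend on a threshold, because it uses the continuous predictions directly.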
Map the potential distribution obtained from the modelling phase.
Project (transfer) the model into a different space and time.
Underlying assumptions:
Species are at equilibrium with the environment
Species and environment are well sampled
We are considering all primary factors determining species distributions
The observation process can also bias our results
Introduction to species distribution modelling (SDM) in R by Damaris Zurell
Species distribution modelling practicals (Macroecology and global change course) by Damaris Zurell
ENM2020: A Free Online Course and Set of Resources on Modeling Species’ Niches and Distributions led by Town Peterson (YouTube playlist, Schedule and PDFs, Publication)
Best Practices in Species Distribution Modeling: A workshop in R by Adam Smith
Species distribution modeling in R by Robert Hijmans and Jane Elith
A very brief introduction to species distribution models in R by Jeff Oliver
SDM course by Bob Muscarella (very useful for finding other resources!)
SDM course by Richard Pearson at UCL